Conversation

nielsbauman (Contributor)

We currently compute the shard allocation explanation for every unassigned shard (primaries and replicas) in the health report API when `verbose` is `true`, which includes the periodic health logs. Computing the shard allocation explanation of a shard is quite expensive in large clusters. Therefore, when there are lots of unassigned shards, `ShardsAvailabilityHealthIndicatorService` can take a long time to complete - we've seen cases of 2 minutes with 40k unassigned shards.

To avoid the runtime of `ShardsAvailabilityHealthIndicatorService` scaling linearly with the number of unassigned shards (times the size of the cluster), we limit the number of allocation explanations we compute to `maxAffectedResourcesCount`, which comes from the `size` parameter of the `_health_report` API and currently defaults to `1000` - a follow-up PR will address the high default size. This significantly reduces the runtime of this health indicator and prevents the periodic health logs from overlapping.

A downside of this change is that the returned list of diagnoses may be incomplete. For example, if the `size` parameter is set to `10`, the first 10 unassigned shards are unassigned due to reason `X`, and the remaining unassigned shards due to reason `Y`, only reason `X` will be returned by the health API. We accept this downside because we expect that only a few distinct diagnoses are generally relevant - if more than `size` shards are unassigned, they are likely all unassigned for the same reason. Users can always increase `size` and/or manually call the allocation explain API to get more detailed information.
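
To make the capping concrete, here is a minimal, illustrative Java sketch of the pattern described above. It is not the actual Elasticsearch code; names such as `UnassignedShard`, `Diagnosis`, and `computeDiagnosis` are hypothetical stand-ins, and only `verbose` and `maxAffectedResourcesCount` correspond to names used in this PR:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not the actual implementation: cap an expensive
// per-shard computation at a constant number of invocations.
class DiagnosisCapSketch {
    // Hypothetical stand-in types for illustration.
    record UnassignedShard(String index, int shardId) {}
    record Diagnosis(String reason) {}

    static List<Diagnosis> explainUnassigned(List<UnassignedShard> unassignedShards,
                                             boolean verbose,
                                             int maxAffectedResourcesCount) {
        List<Diagnosis> diagnoses = new ArrayList<>();
        if (verbose == false) {
            return diagnoses; // diagnoses are only computed in verbose mode
        }
        for (UnassignedShard shard : unassignedShards) {
            // Stop once the cap is reached so the total cost stays constant
            // instead of growing with the number of unassigned shards.
            if (diagnoses.size() >= maxAffectedResourcesCount) {
                break;
            }
            diagnoses.add(computeDiagnosis(shard));
        }
        return diagnoses;
    }

    // Stand-in for the expensive per-shard allocation explanation.
    static Diagnosis computeDiagnosis(UnassignedShard shard) {
        return new Diagnosis("example reason for " + shard.index() + "[" + shard.shardId() + "]");
    }
}
```

With this shape, `size` (and therefore `maxAffectedResourcesCount`) bounds the number of expensive calls regardless of how many shards are unassigned.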

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @nielsbauman, I've created a changelog YAML for you.

Comment on lines +531 to +539
// Computing the diagnosis can be very expensive in large clusters, so we limit the number of
// computations to the maxAffectedResourcesCount. The main negative side effect of this is that
// we might miss some diagnoses. We are willing to take this risk, and users can always
// use the allocation explain API for more details or increase the maxAffectedResourcesCount.
// Since we have two `ShardAllocationCounts` instances (primaries and replicas), we technically
// do 2 * maxAffectedResourcesCount computations, but the benefit of accurately limiting the
// number of calls doesn't outweigh the added complexity, as the main goal is to limit the
// number of computations to a constant rather than a number that grows with the cluster size.
if (verbose && unassigned <= maxAffectedResourcesCount) {

nielsbauman (Contributor, Author)

Should we clarify any of this in the documentation of the API? I'm inclined to say no, but wanted to bring it up to see if others feel differently.
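
As a rough illustration of the bound mentioned in the quoted comment, the following small Java snippet (a hypothetical helper, not part of the PR) shows why the worst case is `2 * maxAffectedResourcesCount` expensive computations - one capped pass for primaries and one for replicas:

```java
// Illustrative only: primaries and replicas each apply the cap independently, so the
// total number of expensive diagnosis computations is bounded by twice the cap rather
// than by the number of unassigned shards.
class WorstCaseBound {
    static int worstCaseComputations(int unassignedPrimaries, int unassignedReplicas, int maxAffectedResourcesCount) {
        return Math.min(unassignedPrimaries, maxAffectedResourcesCount)
            + Math.min(unassignedReplicas, maxAffectedResourcesCount);
    }

    public static void main(String[] args) {
        // e.g. 40,000 unassigned shards split evenly between primaries and replicas,
        // with the current default cap of 1000:
        System.out.println(worstCaseComputations(20_000, 20_000, 1_000)); // prints 2000
    }
}
```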
